[WIP] Support v5litepod-8. #292

Draft
wants to merge 2 commits into
base: main
from

mbobrovskyi
Collaborator

@mbobrovskyi mbobrovskyi commented Dec 11, 2024

Fixes / Features

Testing / Documentation

Testing details.

  • [ y/n ] Tests pass
  • [ y/n ] Appropriate changes to documentation are included in the PR

@mbobrovskyi mbobrovskyi force-pushed the mbobrovskyi/support-v5litepod-8 branch from 6de50b9 to 07f0191 Compare December 11, 2024 17:02
@mbobrovskyi mbobrovskyi marked this pull request as ready for review December 11, 2024 17:04
@pawloch00
Collaborator

@sharabiani do you think we should test it?

@IrvingMg IrvingMg force-pushed the mbobrovskyi/support-v5litepod-8 branch from a2fc5ef to 2a78d68 Compare December 13, 2024 08:17
@@ -1081,6 +1081,15 @@ def get_system_characteristics(
'v5p-17920',
),
# v5litepod
'v5litepod-8': SystemCharacteristics(
'2x4',
4,
Collaborator

  • Isn't ct5lp-hightpu-8t a single-host slice with 1 VM per slice? (ref)
  • Also, creating a single-host TPU slice is explained here separately from multi-host. Please make sure the xpk node-pool creation and workload flows are compatible with your new single-host SystemCharacteristics entry.

More info on Single/Multi Host

Collaborator

Right, I've changed to ct5lp-hightpu-4t for multihost.
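
For readers following the single-host versus multi-host point above, here is a minimal sketch of the arithmetic. `SliceShape` is a hypothetical stand-in written for illustration and is not xpk's `SystemCharacteristics`; the chips-per-VM values (8 for ct5lp-hightpu-8t, 4 for ct5lp-hightpu-4t) are assumed from the machine type suffix.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class SliceShape:
    """Hypothetical illustration only; not xpk's SystemCharacteristics."""

    topology: str      # e.g. '2x4'
    chips_per_vm: int  # assumed: 8 for ct5lp-hightpu-8t, 4 for ct5lp-hightpu-4t

    @property
    def total_chips(self) -> int:
        # '2x4' -> 2 * 4 = 8 chips in the slice.
        return prod(int(dim) for dim in self.topology.split('x'))

    @property
    def vms_per_slice(self) -> int:
        chips = self.total_chips
        if chips % self.chips_per_vm != 0:
            raise ValueError(
                f'{chips} chips cannot be split evenly across VMs '
                f'with {self.chips_per_vm} chips each'
            )
        return chips // self.chips_per_vm

    @property
    def is_single_host(self) -> bool:
        # A slice served by exactly one VM is single-host.
        return self.vms_per_slice == 1


single_host = SliceShape(topology='2x4', chips_per_vm=8)
multi_host = SliceShape(topology='2x4', chips_per_vm=4)
print(single_host.vms_per_slice, single_host.is_single_host)
print(multi_host.vms_per_slice, multi_host.is_single_host)
```

Under these assumptions, the same 2x4 topology maps to one VM with the 8-chip machine type and to two VMs with the 4-chip machine type, which is why the choice of machine type decides whether the single-host or the multi-host node-pool flow applies.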

@mbobrovskyi mbobrovskyi force-pushed the mbobrovskyi/support-v5litepod-8 branch from 1fc9bbe to a006f27 Compare December 17, 2024 06:43
@mbobrovskyi mbobrovskyi force-pushed the mbobrovskyi/support-v5litepod-8 branch 2 times, most recently from aadf642 to fe5760f Compare December 17, 2024 11:17
@mbobrovskyi mbobrovskyi changed the title Support v5litepod-8. [WIP] Support v5litepod-8. Dec 17, 2024
@mbobrovskyi mbobrovskyi force-pushed the mbobrovskyi/support-v5litepod-8 branch 15 times, most recently from b987153 to 346ae74 Compare December 20, 2024 08:34
@IrvingMg IrvingMg force-pushed the mbobrovskyi/support-v5litepod-8 branch 10 times, most recently from a3d8fc7 to c410a86 Compare January 7, 2025 16:37
@IrvingMg
Copy link
Collaborator

IrvingMg commented Jan 7, 2025

Is there anything blocking here?

Is it possible to add a new zone for the pipeline? To test any tpu-v5-lite-podslice system, including v5litepod-8, we need to run in zone us-central1-a, but the current config uses us-central2-b. However, just passing the new region as a parameter to the commands isn't working on GitHub Actions.

EDIT: After removing the custom args and reservation and using --on-demand, we got: Insufficient quota to satisfy the request: Atomic resize failed with [GCE_QUOTA_EXCEEDED]: Quota 'TPU_LITE_PODSLICE_V5' exceeded. Limit: 0.0 in region us-central1.
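
As a rough sketch of how a pipeline script could tell this zero-quota case apart from a transient failure, the snippet below pulls the quota name, limit, and region out of the message quoted above. The helper names and the regular expression are hypothetical and fitted only to that one message; other error formats are not covered.

```python
import re

# Pattern fitted to the single error message quoted in this thread (assumption).
QUOTA_RE = re.compile(
    r"Quota '(?P<name>[^']+)' exceeded\. "
    r"Limit: (?P<limit>[\d.]+) in region (?P<region>[a-z0-9-]+)"
)


def region_of(zone: str) -> str:
    """Derive the region from a zone name, e.g. 'us-central1-a' -> 'us-central1'."""
    return zone.rsplit('-', 1)[0]


def parse_quota_error(message: str):
    """Return (quota_name, limit, region), or None if the message does not match."""
    match = QUOTA_RE.search(message)
    if match is None:
        return None
    return match['name'], float(match['limit']), match['region']


error = (
    "Insufficient quota to satisfy the request: Atomic resize failed with "
    "[GCE_QUOTA_EXCEEDED]: Quota 'TPU_LITE_PODSLICE_V5' exceeded. "
    "Limit: 0.0 in region us-central1."
)
parsed = parse_quota_error(error)
if parsed is not None:
    name, limit, region = parsed
    # A limit of 0 means the project has no quota at all in that region,
    # so retrying in the same region will not help.
    print(name, limit, region, region == region_of('us-central1-a'))
```

The quota is reported per region, so a zone such as us-central1-a is checked against the us-central1 limit.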

@IrvingMg IrvingMg self-assigned this Jan 7, 2025
@IrvingMg IrvingMg force-pushed the mbobrovskyi/support-v5litepod-8 branch 10 times, most recently from 38a9338 to 5c1b251 Compare January 8, 2025 14:56
@IrvingMg IrvingMg force-pushed the mbobrovskyi/support-v5litepod-8 branch 5 times, most recently from b3134c4 to fc176f5 Compare January 9, 2025 07:55
@IrvingMg IrvingMg force-pushed the mbobrovskyi/support-v5litepod-8 branch from fc176f5 to 280d499 Compare January 9, 2025 08:09
4 participants